1. INTRODUCTION

Among children and new learners, the Iqro’ method has proven effective for acquiring proper reading skill in Hijaiyah letters, which is the fundamental skill for reading the Qur’an [1]-[3]. In 
general, learning with the Iqro’ method requires a face-to-face native mentor (Ustaz or Ustazah) to evaluate 
the accuracy of the student’s pronunciation of the Hijaiyah letters. Limited time and mentor 
availability make face-to-face learning difficult to arrange. The assessment of Hijaiyah reading performance is based 
on different criteria depending on the context and the age of the students. For children and 
beginners, the evaluation criteria are mainly based on Makhraj, the proper pronunciation of the Hijaiyah 
letters. Most existing learning apps either serve as auxiliary learning tools or are designed 
as one-way learning tools that help students enhance their vocabulary, writing, and reading ability. Other 
important aspects, such as Makhraj or Arabic pronunciation, are not taken into account in the scoring process.


Journal homepage: http://ijece.iaescore.com 


5314 O ISSN: 2088-8708 


Although enhancing vocabulary, writing, and reading ability is necessary, Makhraj is one of the most 
important criteria in assessing Hijaiyah reading ability. With the advancement of 
mobile technology, many mobile apps have been proposed as tools to assist the Hijaiyah learning 
process. In common Iqro’ apps, Makhraj scoring is mainly based on whether the voice input is recognized, 
so the output is only true or false, as returned by a voice recognizer application programming interface (API) 
such as Google Speech-to-Text, rather than an exact value in the range 0 to 100 [4]-[6]. Under that condition, a 
learner's improvement is difficult to track.

Several studies have analyzed vocal performance using 
acoustic features as parameters [7]-[10]. Previous research has proposed an automatic singing evaluation 
system based on acoustic features and rhythm [11]. An investigation of the effect of pitch and 
rhythm on vocal reading performance indicates that both pitch and rhythm skills are relevant when 
evaluating the input voice [12]. Gupta proposed two pitch-based similarity measures to build 
lexical modification techniques that determine how close a user’s singing clip is to a reference singing clip 
without background music [13]. In another study, Tsai and Lee [14] proposed an automatic evaluation system 
for karaoke singing that emphasizes pitch, volume, and rhythm features. After aligning the test pitch contour 
with the reference pitch contour using dynamic time warping (DTW), they calculated the rhythm score from the 
deviation of the optimal DTW path from a straight-line fit in the DTW cost matrix [14]. These 
studies suggest that pitch, volume, and rhythm have good potential as scoring parameters.
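To make the rhythm measure attributed to Tsai and Lee [14] more concrete, the following minimal Python/NumPy sketch aligns two contours with DTW, fits a least-squares straight line to the optimal warping path, and reports the mean absolute deviation of the path from that line. The absolute-difference local cost, the tie-breaking rule, and the use of the mean deviation as the summary statistic are our assumptions; how [14] maps this deviation to a rhythm score is not reproduced here.

```python
import numpy as np


def dtw_path(ref, test):
    """Optimal DTW warping path between two 1-D contours, as (ref_index, test_index) pairs."""
    n, m = len(ref), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - test[j - 1])  # absolute-difference local cost (assumption)
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to the origin; ties prefer the diagonal step.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda cell: cost[cell])
    return path[::-1]


def path_deviation(path):
    """Mean absolute vertical distance between the warping path and its least-squares line."""
    idx = np.asarray(path, dtype=float)
    slope, intercept = np.polyfit(idx[:, 0], idx[:, 1], 1)
    return float(np.mean(np.abs(idx[:, 1] - (slope * idx[:, 0] + intercept))))
```

A performance that keeps a steady tempo relative to the reference produces a nearly straight path and a deviation close to zero, while a hesitation (for example, holding the first sound too long) bends the path and increases the deviation.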

This paper proposes using pitch, volume, and rhythm as more comprehensive audio 
assessment parameters in the voice recognition module to evaluate a student's ability to pronounce 
Hijaiyah letters. We then investigate the feasibility of pitch, volume, and rhythm as scoring parameters in 
a Hijaiyah m-learning implementation. The proposed application is intended to introduce Hijaiyah letters 
and guide users intuitively toward correct Makhraj pronunciation. Instead of presenting only 
a true-false output, the proposed application calculates Makhraj correctness on a scale from 0 to 100. 
The proposed m-learning application thus supports genuine self-learning of Hijaiyah letters.


2. RESEARCH METHOD 

In this section, we present the sequence of methods supporting our goal of examining users’ perception 
and acceptance of the proposed minimum viable product. Given these contextual problems, voice recognition 
technologies also change how conventional Iqro’ reading and pronunciation are assessed. Developing an 
m-learning prototype is not simply a matter of technical skill, because combining the acoustic parameters 
into a single score using only a smartphone requires a carefully designed algorithm. We focused on using 
pitch, volume, and rhythm as the Iqro’ scoring parameters. The retrospective diary study used for requirement 
elicitation, the concept of the proposed system, and the system design strategies are discussed below.


2.1. Need findings 

We conducted a field study exploring individual responses about experiences with Hijaiyah 
m-learning. We collected user personas before the design phase. A persona is a useful 
tool for describing the user profile of a specific target group; it conveys the relevant demographic, 
psychographic, behavioral, and needs-based attributes [15]. Then, based on insights from the respondent data, we 
built a user journey map to gather user requirements. The user journey map is a recently emerged method for gathering 
requirements and designing the user experience of an application. It adds a third dimension to a 
traditional user persona by focusing on a diachronic outline, that is, how the relationship between a user and a 
product unfolds over time [16]. Recent works show that this approach is effective for rapidly gathering user 
stories in order to develop an intuitive application [17]-[20].

The number of participants involved in field studies and testing can vary. Nielsen et al. [21] 
argue that five participants uncover about as many usability issues as a much larger group of test participants, 
and that five volunteers give an adequate benefit-to-cost ratio in usability testing [21]. Following this 
literature, we involved five participants in this research to elicit the basic requirements for Iqro’ m-learning. 
All were familiar with Android mobile applications running OS version 6 or later, and all had basic 
knowledge of, or experience with, Iqro’ level 1. From our study, we found several insights and barriers 
in reading Hijaiyah letters with an m-learning app: i) respondents need a feature that displays 
reading scores as 0-100 ratings rather than just true-or-false, and ii) respondents need an intuitive app that 
lets them speak directly to it to evaluate their pronunciation. Based on these insights, we proposed an 
intuitive Iqro’ m-learning app that uses pitch, volume, and rhythm as scoring parameters 
and can be used independently by a user with only a smartphone.
2.2. The scoring algorithm

In general, volume, pitch, and rhythm are all directly related to the accuracy of pronunciation 
performance. Volume represents the strength of a sound in an audio composition. Pitch refers to the 
relative lowness or highness of a sound. Rhythm relates to the timing of voiced sounds and silences. 
In 2011, Tsai and Lee [14] proposed a method to evaluate karaoke singing performance 
based on pitch, volume, and rhythm features. This method compares a reference recording with solo vocal 
samples to determine their similarity and score the singing performance [14]. Because it exploits several 
acoustic features while remaining computationally efficient, Tsai’s method stands out from other approaches 
and offers a natural experience when implemented as an interactive smartphone feature. To calculate the 
overall score, the scores from the individual acoustic features are combined using a weighted sum:


$$S_{Overall} = W_{Pit} S_{Pit} + W_{Vol} S_{Vol} + W_{Rhy} S_{Rhy} \tag{1}$$


Here, $S_{Pit}$, $S_{Vol}$, and $S_{Rhy}$ denote the scores obtained from the pitch-based, volume-based, and 
rhythm-based ratings, respectively, and $W_{Pit}$, $W_{Vol}$, and $W_{Rhy}$ are adjustable weights that sum 
to 1. To implement the scoring algorithm, the audio source must be converted into a waveform and a 
spectrogram. The pitch-based, volume-based, and rhythm-based rating scores are calculated through the 
processes shown in Figures 1(a) to (c).
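The waveform-to-spectrogram step can be illustrated with a short Python sketch that uses the standard-library `wave` module and NumPy's FFT. This is not the musicg implementation used in the app; the 16-bit PCM input, the 1024-sample frame, the 512-sample hop, and the Hann window are our assumptions, chosen only to show what a short-time FFT spectrogram contains.

```python
import wave

import numpy as np


def read_wav_mono(source):
    """Read a 16-bit PCM WAV file (path or binary file object) as mono floats in [-1, 1)."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if channels > 1:
        # Average the interleaved channels into a single mono signal.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, rate


def spectrogram(samples, frame_size=1024, hop=512):
    """Magnitude spectrogram: one row per Hann-windowed frame, one column per FFT bin."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        samples = np.pad(samples, (0, frame_size - len(samples)))
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([samples[i * hop:i * hop + frame_size] * window for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_size // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1))
```

Each column index k corresponds to the frequency k * rate / frame_size, so at 8 kHz with 1024-sample frames the resolution is about 7.8 Hz per bin; larger frames sharpen frequency resolution at the cost of time resolution.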

To obtain a pitch-based rating score, the sound file recorded by the user is first converted into waveform 
audio file format (WAV) and sampled; then both the Ustaz's audio file and the user's audio file are converted 
into spectrograms. In this study, the spectrograms are computed with the musicg library version 1.4.2.0, which 
includes a fast Fourier transform (FFT) implementation. From the spectrogram, the PitchHandler class in 
musicg yields the pitch values as an array. These arrays are then compared using the dynamic time warping (DTW) 
class obtained from the GART website to obtain a distance value, from which the result is calculated using the 
pitch scoring formula. In more detail, pitch-based scoring uses a set of musicg functions to sample the audio, 
run the FFT, and compute the spectrogram of the audio file, returning the maximum-frequency values as an 
array or list (used as the volume array) together with the pitch array. The return value of this function is 
therefore a Pair data type holding the pitch and volume arrays. This process is run twice, once for the user's 
audio file and once for the Ustaz's audio file. With the pitch and volume arrays of both files available, a DTW 
object is instantiated to compute two distances: between the user's and the Ustaz's pitch arrays, and between 
the user's and the Ustaz's volume arrays. Finally, the result calculation is called to 
obtain the pitch-based score.
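As a rough illustration of the pitch branch, the sketch below takes the frequency of each frame's strongest FFT bin as that frame's pitch estimate and compares the Ustaz and user contours with textbook DTW. The strongest-bin estimate is a simplification we chose for illustration (musicg's PitchHandler may work differently), and because the text does not give the pitch scoring formula, the sketch stops at the DTW distance.

```python
import numpy as np


def dominant_frequency_track(samples, rate, frame_size=1024, hop=512):
    """Crude pitch contour: frequency (Hz) of the strongest FFT bin in each frame."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        samples = np.pad(samples, (0, frame_size - len(samples)))
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([samples[i * hop:i * hop + frame_size] * window for i in range(n_frames)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return magnitudes.argmax(axis=1) * rate / frame_size


def dtw_distance(a, b):
    """Textbook DTW: accumulated absolute difference along the cheapest alignment."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])
```

A smaller distance means the user's pitch contour follows the reference more closely. Because DTW absorbs tempo differences, a slower but otherwise identical pure tone gives a distance of zero in this synthetic case, whereas a tone at a different pitch does not.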

In musical composition, dynamics are the abbreviations or symbols notated in a score to indicate the degree of 
loudness or softness of a passage and whether the volume changes. Dynamics are relative rather than 
absolute. The process for obtaining a volume-based score is the same as for pitch-based scoring. After the 
spectrograms of the two audio files are obtained, several musicg functions extract the maximum-frequency 
array. This array is compared using DTW to calculate the in-beat value, which is then converted into a score 
with the volume scoring formula.
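Because dynamics are relative, one way to compare loudness contours is to normalize each clip before alignment. The sketch below computes a per-frame RMS level in decibels and subtracts the clip's mean level, so a recording that is uniformly louder yields the same contour. This RMS-based contour is our assumption for illustration only; the app itself derives its volume array from musicg's spectrogram output, and the normalized contours would then be compared with DTW as described above.

```python
import numpy as np


def relative_volume_contour(samples, frame_size=1024, hop=512):
    """Per-frame RMS level in dB, shifted so that the clip's mean level is 0 dB."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        samples = np.pad(samples, (0, frame_size - len(samples)))
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([samples[i * hop:i * hop + frame_size] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Clamp silent frames so log10 stays finite; clamped frames are not scale invariant.
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12))
    # Subtracting the mean keeps only relative dynamics, not absolute loudness.
    return level_db - level_db.mean()
```

Scaling a recording by a constant gain adds the same number of decibels to every frame, which the mean subtraction removes; the rising or falling shape of the contour is what remains to be compared.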

The basic idea of rhythm-based rating is to evaluate the synchronicity (in-beat) between the 
Ustaz's reference audio and the user's voice. After sampling the two audio files, we extract features 
from each to obtain its delta energy as an array. The two corresponding arrays are then compared using 
DTW to obtain the in-beat value.
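The rhythm branch can be sketched in the same way: compute the short-time energy of each frame, take its frame-to-frame difference (the delta energy), and align the two delta-energy arrays with DTW. The frame and hop sizes, the sum-of-squares energy definition, and using the raw DTW distance as a stand-in for the paper's in-beat value are all our assumptions.

```python
import numpy as np


def delta_energy(samples, frame_size=256, hop=128):
    """Frame-to-frame change in short-time energy (sum of squared samples per frame)."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        samples = np.pad(samples, (0, frame_size - len(samples)))
    n_frames = 1 + (len(samples) - frame_size) // hop
    energy = np.array([np.sum(samples[i * hop:i * hop + frame_size] ** 2) for i in range(n_frames)])
    # Positive peaks mark sound onsets, negative peaks mark offsets.
    return np.diff(energy)


def dtw_distance(a, b):
    """Textbook DTW: accumulated absolute difference along the cheapest alignment."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])
```

Because DTW can stretch stretches of silence, a later but otherwise identical onset aligns at zero cost in this simplified setup, while an onset with a different energy jump does not; a stricter in-beat measure would also penalize how far the warping path strays from the diagonal.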


2.3. Prototype description 

The study is designed to investigate the feasibility of pitch, volume, and rhythm as Iqro’ scoring 
parameters. The Android-based m-learning prototype developed in our research provides a series of 
Iqro’ learning content together with quiz and evaluation menus in a single application. The proposed 
prototype uses a voice-based approach, taking the user's voice as input. Since we are developing a mobile 
app as a tool to validate our proposed method for the problems described above, we designed a 
lightweight application architecture whose key feature is displaying a pronunciation score to 
the user as a 0-100 rating. The application converts the raw user voice data from the 
smartphone microphone (mp3) into waveform audio file format (WAV). It records the user's voice with the 
default microphone and then automatically compares the recording with the pre-loaded Ustaz audio file to 
calculate the overall score from the pitch, volume, and rhythm parameters. Technically, the proposed app is 
separated into two modules: the audio converter module and the computation module. After the user's 
voice input is converted, the system performs volume-based, pitch-based, and rhythm-based 
rating, using the pre-recorded Ustaz audio file of the specific Hijaiyah letter as the reference. Figure 2 
depicts the proposed system architecture.
Figure 1. Process algorithm to obtain the rating score of (a) pitch, (b) volume, and (c) rhythm


To display the scoring result, the proposed application must convert the user's raw 
voice into a score value. The application applies the algorithm described above with weights of 
0.45, 0.16, and 0.39 for $W_{Pit}$, $W_{Vol}$, and $W_{Rhy}$, respectively [14]. The application then calculates the 
overall score and displays the result on the screen. Figure 3 shows the contextual block diagram of 
the proposed scoring system.
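A minimal sketch of the weighted sum in Equation (1) with these weights follows. It assumes each sub-score has already been mapped to the 0-100 range by its own scoring formula (those formulas are not reproduced here), and the sub-score values used in the example are hypothetical.

```python
# Weights reported in the text for W_Pit, W_Vol and W_Rhy.
WEIGHTS = {"pitch": 0.45, "volume": 0.16, "rhythm": 0.39}


def overall_score(s_pitch, s_volume, s_rhythm, weights=WEIGHTS):
    """Weighted sum of Equation (1); sub-scores are assumed to be on a 0-100 scale already."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for s in (s_pitch, s_volume, s_rhythm):
        if not 0.0 <= s <= 100.0:
            raise ValueError("each sub-score must lie in 0-100")
    return (weights["pitch"] * s_pitch
            + weights["volume"] * s_volume
            + weights["rhythm"] * s_rhythm)


# Hypothetical sub-scores: 0.45*70 + 0.16*60 + 0.39*80 = 72.3
print(round(overall_score(70, 60, 80), 2))
```

Because the weights sum to 1 and every sub-score lies in 0-100, the overall score is a convex combination and therefore also stays in 0-100, which is what allows the app to report a single 0-100 rating.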
Figure 3. Proposed Makhraj-scoring system
2.4. Prototype evaluation

An experiment was carried out to assess the feasibility of the proposed scoring method using the Pearson 
product-moment correlation coefficient: the Makhraj scores given by the Ustaz were compared with the 
scores produced by the application. Faulkner suggests that a sample of 10 participants discovers about as many usability 
issues as a larger group, and that involving 10 participants provides a proportionate value-to-cost ratio 
in usability testing [22]. Following this literature, we recruited 10 student participants. First, we 
recorded the Ustaz reading and pronouncing the set of Hijaiyah letters with correct Makhraj, and these 
recordings were inserted into the app as reference audio. Each participant then explored the app's functions 
and took the scoring test. We conducted the experiment in a quiet environment to minimize audio noise. 
In this study, we investigated only student performance in reading the Hijaiyah letters of Iqro’ level 1. 
Figure 4 shows the implementation design and prototype of the m-learning app used in this study.

This study calculates the similarity between the Makhraj scores assigned manually by the Ustaz 
and the scores produced by the application, using the Pearson correlation coefficient [23]. 
Following Ratner, the similarity value is obtained with the Pearson correlation coefficient formula 
shown in (2).


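$$r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]\left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}} \tag{2}$$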
where N is the number of data pairs, x is the score produced by the application, and y is the score 
given by the Ustaz. The value of r can be negative; it always lies in the interval [-1, +1]. If r = 0, there is 
no linear correlation between the two variables. If r = +1, the correlation is perfectly positive: when one 
variable increases, the other increases linearly. If r = -1, the correlation is perfectly negative: when one 
variable increases, the other decreases linearly. A value of r between 0 and 0.3 (or between 0 and -0.3) 
indicates a weak positive (or negative) linear correlation. A value between 0.3 and 0.7 (or between -0.3 and -0.7) 
indicates a moderate positive (or negative) linear correlation. A value between 0.7 and 1.0 (or between -0.7 
and -1.0) indicates a strong positive (or negative) correlation [24]-[26].
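The computation in (2) and the interpretation bands above can be sketched in a few lines of plain Python. The function names are ours, and assigning boundary values (exactly 0.3 or 0.7) to the higher band is our choice, since the text does not specify it. The example applies the sketch to the overall scores reported in Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation in the raw-sum form of Equation (2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two points")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return (n * sum_xy - sum_x * sum_y) / denominator


def describe(r):
    """Label r with the weak/moderate/strong bands (boundary values go to the higher band)."""
    if r == 0:
        return "no linear correlation"
    magnitude = abs(r)
    if magnitude < 0.3:
        band = "weak"
    elif magnitude < 0.7:
        band = "moderate"
    else:
        band = "strong"
    return f"{band} {'positive' if r > 0 else 'negative'}"


# Overall scores from Table 1: x = application, y = Ustaz.
app = [68.29, 74.11, 72.16, 70.94, 65.8, 65.63, 71.47, 68.02, 66.28, 68.04]
ustaz = [52.4, 62.2, 81.65, 65.45, 63.05, 52.9, 63.8, 47.7, 62.3, 57.7]
r = pearson_r(app, ustaz)
print(f"r = {r:.2f} ({describe(r)})")  # prints: r = 0.51 (moderate positive)
```

The raw-sum form needs only one pass over the data, which suits a mobile implementation; for very large or badly scaled inputs, the mean-centered form of the same coefficient is numerically safer.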
Figure 4. App screenshots used in this study


3. RESULTS AND ANALYSIS 

As part of the feasibility test of the proposed approach, a direct comparison was conducted between 
two scoring results: one produced by the proposed application and one measured manually 
by the Ustaz. Each respondent read and recorded their pronunciation of the Hijaiyah letters & Í &, and we then 
compared the overall score given by the Ustaz with the score given by the app implementing the proposed 
approach. Both scoring processes used the same acoustic parameters: pitch, volume, and 
rhythm. The scores for each participant were recorded, and the comparison of the overall scores is 
shown in Table 1.


Table 1. The comparison result of overall score 


Respondent       Ustaz overall score   App overall score
Respondent 1     52.4                  68.29
Respondent 2     62.2                  74.11
Respondent 3     81.65                 72.16
Respondent 4     65.45                 70.94
Respondent 5     63.05                 65.8
Respondent 6     52.9                  65.63
Respondent 7     63.8                  71.47
Respondent 8     47.7                  68.02
Respondent 9     62.3                  66.28
Respondent 10    57.7                  68.04
Pearson correlation result (Ustaz vs. app): 0.51
4. CONCLUSION

In this paper, we use acoustic features to propose an alternative Makhraj 
scoring method for Hijaiyah m-learning. To prove our concept, we developed an m-learning prototype and 
conducted a direct comparison to evaluate the feasibility of pitch, volume, and rhythm as scoring parameters. 
The performance of 10 respondents reading and pronouncing Hijaiyah letters was scored both manually by the 
Ustaz and by the application. Comparing the scores of the proposed m-learning app, which uses pitch, 
volume, and rhythm-based features, with the subjective judgments of the Ustaz yielded a Pearson 
product-moment correlation coefficient of 0.51, a moderate positive correlation, indicating that the proposed 
system is promising as an alternative scoring method. In the future, we will investigate tuning the weight of 
each feature used in the computation to improve the performance of the proposed method. Because this study is 
still at a preliminary stage, more data are also needed in the evaluation process to validate our results, for 
example by applying the proposed method to more advanced Iqro’ levels. In addition, we will explore other 
acoustic features, such as timbre-based analysis, to further improve the system algorithm. With further 
analysis and accuracy improvement, the proposed m-learning application could greatly help evaluate 
the Makhraj of students who cannot frequently meet an Ustaz in person.